Wie ist die Medienresonanz von Pressemittelungen politischer Parteien?

Assumptions:

  1. Parteien wollen “ihre” Themen in den Medien platzieren, d.h. die Themen, die den - meist programmatisch bestimmten - Kern ihrer Wahlaussage bilden.

  2. Parteien wollen Probleme in der Vordergrund rücken, für die sie nach Ansicht der Bevölkerung insgesamt oder nach Ansicht des eigenen Anhangs die Lösungskompentenz besitzen.

  3. Parteien wollen Themen vermeiden, die aufgrund der aktuellen Sachlage gegen sie sprechen. Stattdessen wollen sie andere Themen (Sachthemen, Personal- und Stilfragen) in den Vordergrund rücken. Instrument hierfür sind Pressemitteilungen der Parteien und Fraktionen.

  4. Parteien möchten, dass ihre Sichtweisen möglichst ungekürzt und unverfälscht publiziert werden.

Parties and candidates not only want to be present in the media (coverage bias), or evaluated in a positive way (tonality bias). They also want the media agenda to be congruent with their own agenda to define the issue-based criteria on which they will be evaluated by voters. Thus, parties choose their issue agenda carefully, highlighting issues that they are perceived to be competent on, that they “own” and that are important to their voters. In that sense agenda bias refers to the extent to which political actors appear in the public domain in conjunction with the topics they wish to emphasize.

To allow for an operationalization of agenda bias, I use parties’ campaign communication as an approximation of the potential universe of news stories (D’Alessio & Allen, 2000; Eberl, 2017). I compare the policy issues addressed in campaign communication (i.e., the party agenda) with the policy issues the parties address in media coverage (i.e., the mediated party agenda).

To discover the latent topics in the corpus of press releases (1.942) and news articles (11.880), a structural topic modeling (STM) developed by Roberts (2016) is applied. The STM is an unsupervised machine learning approach that models topics as multinomial distributions of words and documents as multinomial distributions of topics, allowing to incorporate external variables that effect both, topical content and topical prevalence.

Structural Topic Model

Build Corpus

Select Model

STM assumes a fixed user-specified number of topics. There is not a “right” answer to the number of topics that are appropriate for a given corpus (Grimmer and Stewart 2013), but the function searchK uses a data-driven approach to selecting the number of topics. The function will perform several automated tests to help choose the number of topics including calculating the held out likelihood (Wallach et al. 2009) and performing a residual analysis (Taddy 2012).

Run Model

I included the document source as a control for the topical topical prevalence, assuming that the distribution of topics depends on the sources. The number of topics is set to 80.

Results

library(stm)
library(tidyverse)
library(ggthemes)

rm(list = ls())
load("../output/models/finalmodel_60.RDa")

model_df <- model_df %>%
  dplyr::mutate(doc_index = as.numeric(rownames(.)),
         source = ifelse(source == "welt.de", "DIE WELT", source),
         source = ifelse(source == "zeit.de", "ZEIT ONLINE", source),
         source = ifelse(source == "focus.de", "FOCUS Online", source),
         source = ifelse(source == "bild.de", "Bild.de", source),
         source = ifelse(source == "spiegel.de", "SPIEGEL ONLINE", source),
         
         source = ifelse(source == "union", "Union", source),
         source = ifelse(source == "spd", "SPD", source),
         source = ifelse(source == "afd", "AfD", source),
         source = ifelse(source == "gruene", "Grüne", source),
         source = ifelse(source == "linke", "Linke", source),
         source = ifelse(source == "fdp", "FDP", source)
         )
model_df %>%
  ggplot(aes(source, fill=type)) +
  geom_bar(show.legend = F, alpha = 0.8) +
  coord_flip() +
  facet_wrap(~type, scales = "free") +
  labs(title = "Document distribution", y=NULL, x = NULL)

Label topics

To explore the words associated with each topic we use the words with the highest probability in each topic. As we included the source type (press release or news paper) as a control for the topical content (the word distribution of each topic), we have two different labels for each topic.

sagelabs <- sageLabels(stmOut)
topics.df <- as.data.frame(sagelabs$marginal$prob) %>%
  transmute(topic = as.numeric(rownames(.)),
            joint_label = paste( "Topic",topic, ":", V1,V2,V3,V4))

topics.df %>% select(joint_label) %>% 
  htmlTable::htmlTable(align="l", header = c("Topic Label"),
                       rnames = F)
Topic Label
Topic 1 : koalition grünen spd schwarz
Topic 2 : spd schulz nahles martin
Topic 3 : eu europa staaten usa
Topic 4 : the is new of
Topic 5 : diesel hendricks autos fahrverbote
Topic 6 : trump gipfel us hamburg
Topic 7 : deutschland merkel regierung land
Topic 8 : grünen csu union cdu
Topic 9 : afd partei petry fraktion
Topic 10 : cdu niedersachsen spd althusmann
Topic 11 : prozent spd afd umfrage
Topic 12 : euro union milliarden cdu
Topic 13 : fragen daten beck bundesregierung
Topic 14 : deutschland menschen millionen europa
Topic 15 : merkel kanzlerin angela bundeskanzlerin
Topic 16 : afd prozent facebook zdf
Topic 17 : merkel wahl bundestagswahl afd
Topic 18 : csu cdu merkel union
Topic 19 : spahn cdu jens präsidiumsmitglied
Topic 20 : fdp jamaika grünen lindner
Topic 21 : spd nahles frauen andrea
Topic 22 : schäuble wolfgang finanzminister spd
Topic 23 : kinder eltern familie rente
Topic 24 : kohl helmut kanzler kohls
Topic 25 : eu europa deutschland europäischen
Topic 26 : menschen welt deutschland politik
Topic 27 : grünen özdemir partei habeck
Topic 28 : afd glaser stiftung partei
Topic 29 : spd mecklenburg schwesig vorpommern
Topic 30 : spd koalitionsverhandlungen groko union
Topic 31 : pflege beck spd prozent
Topic 32 : schulz spd martin merkel
Topic 33 : ge wulff be ten
Topic 34 : jahr zahl deutschland bundesregierung
Topic 35 : palmer bosbach jutta maischberger
Topic 36 : spd koalition union groko
Topic 37 : cdu vw altmaier spd
Topic 38 : polizei berliner berlin prozent
Topic 39 : afd aufhebung staatsanwaltschaft frauke
Topic 40 : bundeswehr soldaten leyen nato
Topic 41 : bundestag afd fraktion abgeordneten
Topic 42 : gabriel sigmar außenminister deutschland
Topic 43 : hamburg polizei gipfel hamburger
Topic 44 : baden württemberg fahrt deutschland
Topic 45 : cdu schleswig günther alter
Topic 46 : gauland afd alexander özoguz
Topic 47 : spd bundestag abstimmung union
Topic 48 : aachen guttenberg wahlkampf parteien
Topic 49 : afd höcke antisemitismus bewegung
Topic 50 : scholz hamburg polizei stadt
Topic 51 : türkei deutschland erdogan bundesregierung
Topic 52 : afd partei hampel pazderski
Topic 53 : deutschland flüchtlinge familiennachzug menschen
Topic 54 : deutschland bildung bund schulen
Topic 55 : weidel afd alice spitzenkandidatin
Topic 56 : amri berliner sicherheit anschlag
Topic 57 : linke linken spd partei
Topic 58 : maizière hamburg innenminister deutschland
Topic 59 : seehofer csu söder horst
Topic 60 : schmidt glyphosat hendricks spd
theta <- as.data.frame(stmOut$theta) %>% # get all theta values for each document
  
  mutate(doc_index = as.numeric(rownames(.))) %>%
  # convert to long format
  gather(topic, theta, -doc_index) %>%
  mutate(topic = as.numeric(gsub("V","",topic))) %>%
  
  # join with topic df
  left_join(., topics.df, by="topic") %>%
  
  # join with model_df
  left_join(., model_df %>% 
              select(date,type,source,doc_index,title_text), by="doc_index")

Topic distribution

For each document, we have a distribution over all topics, e.g.:

sample_doc <- sample(nrow(model_df),1)

# uncomment this to only select docs from press releases
#sample_doc <- theta %>% filter(type=="press") %>% sample_n(1) %>% select(doc_index)
#sample_doc <- sample_doc$doc_index

title <- model_df$title[which(model_df$doc_index == sample_doc)]
source <- model_df$source[which(model_df$doc_index == sample_doc)]

theta %>%
  filter(doc_index == sample_doc) %>%
  select(doc_index, joint_label, theta) %>%
  ggplot(aes(joint_label, theta)) +
  geom_col(fill="#0099c6", alpha = 0.8) +
  ylim(c(0,1)) +
  coord_flip() +
  theme_hc() +
  labs(title = paste("Topic distribution of document",sample_doc),
       subtitle = paste0("Source: ",source,"\nTitle: ", title),
       x = NULL, y = NULL
       ) +
  theme(axis.text = element_text(size = 10))

What is the document acutally about?

model_df %>%
  filter(doc_index == sample_doc) %>%
  select(source, title_text) %>%
  htmlTable::htmlTable(align="l", rnames=FALSE, header = c("Source", "Title + Body"))
Source Title + Body
stern.de Kahrs: «Im Bund keine Option»: Rufe nach schärferer Absage an Rot-rot-grün in der SPD Kahrs: «Im Bund keine Option»: Rufe nach schärferer Absage an Rot-rot-grün in der SPD 22. Juli 2017 15:35 Uhr Kahrs: «Im Bund keine Option» Rufe nach schärferer Absage an Rot-rot-grün in der SPD Berlin - In der SPD werden zwei Monate vor der Bundestagswahl Rufe nach einer schärferen Absage an eine mögliche Koalition mit Beteiligung der Linkspartei laut. Fullscreen Haushaltsexperte Johannes Kahrs vertritt in der SPD den konservativen Seeheimer Flügel. Foto: Rainer Jensen © dpa-infocom GmbH In der SPD werden zwei Monate vor der Bundestagswahl Rufe nach einer schärferen Absage an eine mögliche Koalition mit Beteiligung der Linkspartei laut. «Rot-Rot-Grün ist in ostdeutschen Ländern möglich, für die Bundesebene ist es keine Option», sagte der Sprecher des konservativen Seeheimer Kreises in der SPD, , der «Welt am Sonntag». Die Linke habe die SPD als Hauptgegner und stehe für keine seriöse Außen- und Wirtschaftspolitik. Der SPD-Bundestagsabgeordnete Christian Flisek sagte der Zeitung: «Die Konflikte in einer Regierung aus SPD, Grünen und Linken wären derzeit immer noch zu groß, um Deutschland stabil zu regieren.» SPD-Bundesvize Ralf Stegner mahnte an, für eine möglichst starke SPD zu kämpfen, was Optionen eröffnen würde. «Koalitionsoptionen anzustreben oder auszuschließen, stärkt nur andere Parteien», schrieb der schleswig-holsteinische SPD-Landeschef am Samstag bei Twitter. Der Chef der SPD-Fraktion in Rheinland-Pfalz, Alexander Schweitzer, sagte der « Welt am Sonntag »: «Wenn eine Dreier-Konstellation notwendig wird, bin ich klar für Rot-Grün-Gelb.» In seinem Land funktioniere die Ampel wunderbar. dpa

Topic frequency

The expected proportion of the corpus that belongs to each topic is used to get an initial overview of the results. The figure below displays the topics ordered by their expected frequency across the corpus. The four most frequent words in each topic are used as a label for that topic.

overall_freq <- as.data.frame(colMeans(stmOut$theta)) %>%
  transmute(
    topic = as.numeric(rownames(.)),
    frequency = colMeans(stmOut$theta)
         ) %>%
  left_join(., topics.df, by = "topic") %>% 
  arrange(desc(frequency))%>%
  mutate(order = row_number())
overall_freq %>%
  ggplot(aes(reorder(joint_label, -order), 
             frequency, fill=frequency)) +
  geom_col(show.legend = F) +
  coord_flip() +
  scale_fill_gradient(low = "#00AFBB", high = "#FC4E07") +
  labs(x=NULL, y=NULL) 

#ggsave("../figs/topic_proportion.png", height = 6, width = 4)

Measure Agendas

Agendas were measured in terms of percentage distributions across the 80 topics. For each source the average distribution of each topic is calculated for each month. The following pictures show the overall topic distribution.

# calculate topic mean by source and month
topicmean <- theta %>%
  mutate(
    year = lubridate::year(date),
    month = lubridate::month(date)
    ) %>%
  group_by(topic,source, month, year) %>%
  dplyr::summarise(topicmean = mean(theta)) %>% 
  ungroup() %>%
  spread(source, topicmean) %>%
  filter(month != 3)
topicmean_news <- theta %>%
  filter(type == "news") %>%
  group_by(topic,joint_label, source) %>%
  summarise(topicmean = mean(theta)) %>% 
  ungroup()

topicmean_press <- theta %>%
  filter(type == "press") %>%
  group_by(topic,joint_label, source) %>%
  summarise(topicmean = mean(theta)) %>% 
  ungroup()
topicmean_news %>%
  group_by(source) %>%
  arrange(desc(topicmean), .by_group = TRUE) %>%
  mutate(topic_order = row_number()) %>%
  ungroup() %>%
  
  group_by(joint_label) %>%
  mutate(topicmean_mean = mean(topicmean)) %>%
  ungroup() %>%
  top_n(70, topicmean_mean) %>%
  
  ggplot(aes(reorder(joint_label, topicmean_mean),
             topicmean, label = topic_order,
             fill = topic_order)) +
  geom_col(show.legend = F) +
  geom_text(hjust=-0.1, size=5) +
  coord_flip() +
  scale_fill_gradient(low = "#00AFBB", high = "#FC4E07") +
  facet_wrap(~source, nrow = 1) +
  labs(x=NULL, y=NULL) +
  theme(axis.text.y = element_text(size=12))

topicmean_press %>%
  group_by(source) %>%
  arrange(desc(topicmean), .by_group = TRUE) %>%
  mutate(topic_order = row_number()) %>%
  ungroup() %>%
  
  group_by(joint_label) %>%
  mutate(topicmean_mean = mean(topicmean)) %>%
  ungroup() %>%
  top_n(50, topicmean_mean) %>%
  
  ggplot(aes(reorder(joint_label, topicmean_mean),
             topicmean, label = topic_order,
             fill=topic_order)) +
  geom_col(show.legend = F) +
  geom_text(hjust=-0.1, size=5) +
  coord_flip() +
  scale_fill_gradient(low = "#00AFBB", high = "#FC4E07") +
  facet_wrap(~source, nrow = 1) +
  labs(x=NULL, y=NULL) +
  theme(axis.text.y = element_text(size=12))

Correlation of topic prevalence

Then, we estimated bivariate correlations between party agendas and the mediated party agendas in the online news. These correlations represent the agenda selectivity each party experiences in each media outlet. The higher the correlation, the more congruent both agendas are.

media <- unique(model_df %>% filter(type == "news") %>% select(source))
parties <- unique(model_df %>% filter(type == "press") %>% select(source))
  
rm(corrDF)
for (i in parties$source) {
  
  tempdf <- topicmean %>%
    group_by(month, year) %>%
    do(data.frame(Cor=t(cor(.[,media$source], .[,i])))) %>%
    gather(medium, cor, 3:9) %>%
    mutate(party = i,
           medium = gsub("Cor.","",medium)) %>%
    ungroup()
  
  if (exists("corrDF")){
    corrDF <- rbind(corrDF,tempdf)
  } else {
    corrDF <- tempdf
  }
  
}

agenda <- corrDF %>% 
  mutate(date = as.Date(paste0(year,"/",month,"/1"))) %>%
  dplyr::mutate(medium = ifelse(medium == "DIE.WELT", "DIE WELT", medium),
                medium = ifelse(medium ==  "ZEIT.ONLINE", "ZEIT ONLINE", medium),
                medium = ifelse(medium == "FOCUS.Online", "FOCUS Online", medium),
                medium = ifelse(medium == "SPIEGEL.ONLINE", "SPIEGEL ONLINE", medium)
  )
normalize_data <- function(x) {
  # normalize data between -1,1
  if (is.numeric(x)) {
    y <- 2*((x - min(x, na.rm = T)) / (max(x, na.rm = T) - min(x, na.rm = T)))-1
    return(y)
  } else {
    return(x)
  }

}
p <- agenda %>%
  mutate(
    date =as.Date(paste("01",month,year, sep = "-"), format="%d-%m-%Y")
  ) %>%
  ggplot(aes(date, cor, color = medium)) +
  geom_line(show.legend = F) +
  geom_hline(yintercept = 0, size = 0.3, color = "grey30", linetype = 2) +
  facet_wrap(~party) +
  labs(y=NULL, x =NULL) 
  # guides(colour = guide_legend(nrow = 1)) +
  # theme(legend.position = "bottom",
  #       legend.title = element_blank())

plotly::ggplotly(p, tooltip=c("cor","medium"))

Correlation of topic prevalence - grouped by party & medium

agenda %>%
  group_by(party, medium) %>%
  summarize(cor = mean(cor, na.rm = T)) %>%
  spread(key = party, value = cor) %>%
  ggiraphExtra::ggRadar(aes(color = medium),
                        interactive = T,
                        alpha = 0,
                        rescale = F,
                        legend.position = "bottom")